11 research outputs found

    Interconnect for commodity FPGA clusters: Standardized or customized?

    Get PDF

    General hardware multicasting for fine-grained message-passing architectures

    Get PDF
    Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model

    Fast Protection-Domain Crossing in the CHERI Capability-System Architecture

    Get PDF
    Capability Hardware Enhanced RISC Instructions (CHERI) supplement the conventional memory management unit (MMU) with instruction-set architecture (ISA) extensions that implement a capability system model in the address space. CHERI can also underpin a hardware-software object-capability model for scalable application compartmentalization that can mitigate broader classes of attack. This article describes ISA additions to CHERI that support fast protection-domain switching, not only in terms of low cycle count, but also efficient memory sharing with mutual distrust. The authors propose ISA support for sealed capabilities, hardware-assisted checking during protection-domain switching, a lightweight capability flow-control model, and fast register clearing, while retaining the flexibility of a software-defined protection-domain transition model. They validate this approach through a full-system experimental design, including ISA extensions, a field-programmable gate array prototype (implemented in Bluespec SystemVerilog), and a software stack including an OS (based on FreeBSD), compiler (based on LLVM), software compartmentalization model, and open-source applications.This work is part of the CTSRD and MRC2 projects sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-10-C-0237 and FA8750-11-C-0249. We also acknowledge the Engineering and Physical Sciences Research Council (EPSRC) REMS Programme Grant [EP/K008528/1], the EPSRC Impact Acceleration Account [EP/K503757/1], EPSRC/ARM iCASE studentship [13220009], Microsoft studentship [MRS2011-031], the Isaac Newton Trust, the UK Higher Education Innovation Fund (HEIF), Thales E-Security, and Google, Inc.This is the author accepted manuscript. The final version of the article can be found at: http://ieeexplore.ieee.org/document/7723791
    corecore